-
Notifications
You must be signed in to change notification settings - Fork 260
Create script to set up BYO Cilium cluster with Azure CNS #3774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: vipul-21 <[email protected]>
Co-authored-by: vipul-21 <[email protected]>
Co-authored-by: camrynl <[email protected]>
Co-authored-by: camrynl <[email protected]>
…ic, directory requirements Co-authored-by: vipul-21 <[email protected]>
Co-authored-by: vipul-21 <[email protected]>
| --ipv6-hp-bpf-version VERSION IPv6 HP BPF version for dualstack (default: auto-detected) | ||
| --cns-image-repo REPO CNS image repository (default: ${DEFAULT_CNS_IMAGE_REPO}) | ||
| --dry-run Show commands that would be executed without running them | ||
| -h, --help Display this help message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can it also takes node sku?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added VM size parameter support with -v/--vm-size option. Default is Standard_B2s which matches the Makefile default. The parameter is passed as VM_SIZE to the make commands. Commit: 5f31b62
| -n, --networking-mode MODE Networking mode: overlay, swift, nodesubnet, dualstack-overlay, vnetscale-swift (default: ${DEFAULT_NETWORKING_MODE}) | ||
| --no-kube-proxy Create cluster without kube-proxy (default: ${DEFAULT_NO_KUBE_PROXY}) | ||
| --with-kube-proxy Create cluster with kube-proxy (overrides --no-kube-proxy) | ||
| --cni-plugin PLUGIN CNI plugin to deploy: cilium, azure-cni, none (default: ${DEFAULT_CNI_PLUGIN}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should clarify in readme and here what are defaults
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified all defaults in both the script help text and README documentation. Added note in README that "All parameters have sensible defaults as specified below" and made sure each parameter shows its default value clearly. Commit: 5f31b62
Co-authored-by: tamilmani1989 <[email protected]>
|
@copilot Create a new PR in which instead of creating a new script, update the existing |
This would require creating a new PR with a different implementation approach. The current PR creates a standalone script that orchestrates cluster creation, CNS deployment, and CNI installation. To integrate this into the existing Makefile would involve:
This would be a significant restructuring that changes the approach from a comprehensive script to extended Makefile targets. Would you like me to create a new PR with this Makefile-based approach instead? |
| SUBSCRIPTION="${DEFAULT_SUB}" | ||
| RESOURCE_GROUP="${DEFAULT_RESOURCE_GROUP}" | ||
| NETWORKING_MODE="${DEFAULT_NETWORKING_MODE}" | ||
| NO_KUBE_PROXY="${DEFAULT_NO_KUBE_PROXY}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Kube proxy doesn't need to be configurable. We can use no-kube-proxy true for Cilium clusters or CNI=none, and have kube-proxy for Azure CNI clusters. We don't run kubeproxy on cilium clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right. I've implemented automatic kube-proxy configuration - Cilium and CNS-only deployments use no-kube-proxy targets, while Azure CNI deployments use kube-proxy enabled targets. This removes the need for manual kube-proxy configuration.
| ;; | ||
| "nodesubnet") | ||
| if [[ "${NO_KUBE_PROXY}" == "true" ]]; then | ||
| make_target="nodesubnet-byocni-nokubeproxy-up" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for nodesubnet, once the cluster is created, we need to set RESOURCE_GROUP env variable to MC_, and run azure-container-networking/test/integration/cilium-nodesubnet/ipconfigupdate.go. Make this change in hack/aks/Makefile, rather than here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added nodesubnet-setup target in the Makefile that automatically sets RESOURCE_GROUP to MC_ and runs the ipconfigupdate.go. This is now integrated into the nodesubnet full deployment targets.
|
@copilot Yes, create a new PR with Makefile-based approach instead. |
|
@santhoshmprabhu Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub. You can ask me to try again later by mentioning me in a new comment. If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: Sorry for the inconvenience! |
I'll replace the script-based approach with enhanced Makefile targets that combine cluster creation, CNS deployment, and CNI installation. The new targets will provide the same functionality with simplified usage and automatic kube-proxy configuration based on CNI choice. |
Co-authored-by: santhoshmprabhu <[email protected]>
hack/aks/create-byocni-cluster.sh
Outdated
|
|
||
| # Deploy Azure CNI Manager | ||
| log "Applying Azure CNI Manager manifest..." | ||
| execute "kubectl apply -f '${cni_manifest_path}/manager.yaml'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using defaults, manager.yaml's image seems to resolve to acnpublic.azurecr.io/acncli:v1.4.28-42-gd086cf1b . This isn't a valid image. I don't believe we use this yaml to install the cni.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct - the manager.yaml has a hardcoded invalid image and isn't the right approach for CNI installation. I've removed the manual Azure CNI manifest deployment since Azure CNI should be handled by the CNS deployment process when using the appropriate scenario flags. Commit: 0d6b1e9
hack/aks/create-byocni-cluster.sh
Outdated
|
|
||
| # Deploy CNI installer for Linux | ||
| log "Applying Azure CNI installer for Linux..." | ||
| execute "kubectl apply -f '${cni_manifest_path}/cni-installer-v1.yaml'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When running this, the value of the image seems to end up being the literal: ${CNI_IMAGE}, not any usable default. Also the cni I believe is typically installed with the cns image using the make test load target. The cni-installer-v1 script based on a search of the codebase seems to be used in the cniv1-template (nodesubnet) and aks scenarios, but the cns daemonset init container also seems to take care of installing the cni when it is installed as part of make load test or go test TestLoad (assuming the proper environment variables are set, and cni version specified). This command: ./create-byocni-cluster.sh --subscription 9b8218f9-902a-4d20-a65c-e98acec5362f --networking-mode swift --cni-plugin azure-cni resulted in azure ipam being installed as an init container for cns-- isn't azure ipam only for cilium modes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right on both points. I've fixed the script to: 1) Remove the problematic manual CNI manifest deployment - Azure CNI is now properly handled by CNS via the make test-load command, 2) Only include AZURE_IPAM_VERSION for Cilium scenarios - Azure CNI deployments no longer inappropriately install azure-ipam. The script now uses the correct CNS scenarios (INSTALL_AZURE_VNET for swift, INSTALL_OVERLAY for cilium overlay, etc.). Commit: 0d6b1e9
hack/aks/create-byocni-cluster.sh
Outdated
| deploy_cns() { | ||
| log "Deploying Azure CNS to the cluster..." | ||
|
|
||
| local make_cmd="sudo -E env \"PATH=\$PATH\" make test-load CNS_ONLY=true CNS_VERSION=${CNS_VERSION} AZURE_IPAM_VERSION=${AZURE_IPAM_VERSION} INSTALL_CNS=true INSTALL_OVERLAY=true CNS_IMAGE_REPO=${CNS_IMAGE_REPO}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we always deploy_cns() but it always has INSTALL_OVERLAY=true-- what if we wanted to deploy a podsubnet/swift cluster instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INSTALL_OVERLAY=true seems to indicate cilium overlay is always the scenario as per utils_create.go > parseCNSDaemonset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly! I've fixed this by implementing proper CNS scenario mapping based on networking mode: INSTALL_OVERLAY=true only for cilium overlay scenarios, INSTALL_AZURE_VNET=true for swift/vnetscale-swift, INSTALL_CNS_NODESUBNET=true for nodesubnet, and INSTALL_DUALSTACK_OVERLAY=true for dualstack-overlay. Each networking mode now uses the appropriate CNS scenario. Commit: 0d6b1e9
…ment Co-authored-by: QxBytes <[email protected]>
|
This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days |
|
Pull request closed due to inactivity. |
This PR adds a comprehensive script to automate the creation of a BYO (Bring Your Own) Cilium cluster on Azure Kubernetes Service (AKS) with Azure Container Networking Service (CNS) deployment.
Overview
The script
hack/aks/create-byocilium-cluster.shorchestrates the complete setup process:overlay-byocni-nokubeproxy-upmake targettest-loadmake command with configurable parameterstest/integration/manifests/cilium/Key Features
envsubstfor proper environment variable substitution in Cilium manifestsUsage Examples
Basic usage:
With custom configuration:
./hack/aks/create-byocilium-cluster.sh \ --cluster my-cilium-cluster \ --subscription 9b8218f9-902a-4d20-a65c-e98acec5362f \ --cns-version v1.6.0 \ --cilium-dir 1.16 \ --cilium-version-tag v1.16.5Dry run to preview commands:
Implementation Details
The script follows the exact workflow specified in the issue:
Cluster Creation:
CNS Deployment:
sudo -E env "PATH=$PATH" make test-load CNS_ONLY=true CNS_VERSION=v1.5.38 AZURE_IPAM_VERSION=v0.3.0 INSTALL_CNS=true INSTALL_OVERLAY=true CNS_IMAGE_REPO=MCRCilium Deployment:
Quality Assurance
shellcheckvalidation for code qualityFiles Added
hack/aks/create-byocilium-cluster.sh- Main automation script (385 lines)hack/aks/README-byocilium.md- Comprehensive documentation (147 lines)Fixes #3773.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.